Application of BIRCH to text clustering
Abstract
This work presents a technique for clustering large, high-dimensional datasets based on the Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm and latent semantic analysis (LSA) methods. We present a document model and a clustering tool for processing texts in Russian and English, and we compare our results with other clustering techniques. Experimental results are provided for clustering datasets of 10,000, 100,000 and 850,000 documents.
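The abstract does not spell out the clustering core, but BIRCH itself rests on the clustering feature (CF) triple (N, LS, SS): a point count, a linear sum and a sum of squared norms. A cluster's centroid and radius can be computed from this triple alone, and two triples merge by simple addition. The following is a minimal Python sketch of BIRCH's first, absorbing pass under simplifying assumptions: the CF tree is flattened into a list of leaf entries (no branching factor, no node splits, no later global clustering phase), and `CF` and `birch_leaf_pass` are hypothetical names. In the setting described above the input points would presumably be LSA-reduced document vectors; plain 2-D tuples are used here for readability.

```python
import math


class CF:
    """Clustering feature (N, LS, SS) summarizing a set of d-dimensional points."""

    def __init__(self, point):
        self.n = 1                                  # number of points
        self.ls = list(point)                       # linear sum of the points
        self.ss = sum(x * x for x in point)         # sum of squared norms

    def centroid(self):
        return [x / self.n for x in self.ls]

    def radius_if_added(self, point):
        # Radius of the merged summary, sqrt(SS/N - ||LS/N||^2), computed
        # from the CF triple alone without revisiting earlier points.
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, point)]
        ss = self.ss + sum(x * x for x in point)
        c2 = sum((x / n) ** 2 for x in ls)
        return math.sqrt(max(ss / n - c2, 0.0))     # clamp tiny negative rounding

    def add(self, point):
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss += sum(x * x for x in point)


def birch_leaf_pass(points, threshold):
    """Simplified BIRCH phase 1 (hypothetical helper): absorb each point into
    the nearest CF entry if the merged radius stays within `threshold`,
    otherwise start a new entry. A real CF tree routes points through
    non-leaf nodes bounded by a branching factor; this flat list omits that."""
    entries = []
    for p in points:
        if entries:
            nearest = min(entries, key=lambda e: math.dist(e.centroid(), p))
            if nearest.radius_if_added(p) <= threshold:
                nearest.add(p)
                continue
        entries.append(CF(p))
    return entries


entries = birch_leaf_pass([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)], threshold=0.5)
print([e.n for e in entries])  # → [2, 2]
```

Lowering `threshold` yields more, tighter entries. In full BIRCH, when the tree outgrows available memory, the threshold is raised and the tree is rebuilt from the existing leaf entries.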
Similar resources
Application of modified balanced iterative reducing and clustering using hierarchies algorithm in parceling of brain performance using fMRI data
Introduction: Clustering of the human brain is a very useful tool for the diagnosis, treatment, and tracking of brain tumors, and several methods exist for this purpose. In this study, a modified balanced iterative reducing and clustering using hierarchies (m-BIRCH) algorithm was introduced for brain activation clustering. This algorithm has an appropriate speed and good scalability in dealing ...
Advanced Split BIRCH Algorithm in Reconfigurable Network
The Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH) algorithm has the disadvantage of reduced accuracy when clusters have arbitrary shapes; to address this, an advanced split BIRCH algorithm (AS-Birch) was put forward. Through the analysis of the reconfigurable network and a detailed analysis of application scenarios and functional requirements of business clusterin...
A Joint Semantic Vector Representation Model for Text Clustering and Classification
Text clustering and classification are two main tasks of text mining. Feature selection plays a key role in the quality of clustering and classification results. Although word-based features such as term frequency-inverse document frequency (TF-IDF) vectors have been widely used in different applications, their shortcomings in capturing the semantic concepts of text motivated researchers to use...
Document Clustering Based on Ontology and a Fuzzy Approach
Data mining, also known as knowledge discovery in databases, is the process of discovering unknown knowledge from a large amount of data. Text mining applies data mining techniques to extract knowledge from unstructured text. Text clustering, the unsupervised classification of similar documents into different groups, is one of the important techniques of text mining. The most important step...
BIRCH: An Efficient Data Clustering Method for Very Large Databases
Finding useful patterns in large datasets has attracted considerable interest recently, and one of the most widely studied problems in this area is the identification of clusters, or densely populated regions, in a multi-dimensional dataset. Prior work does not adequately address the problem of large datasets and minimization of I/O costs. This paper presents a data clustering method named BIRCH...
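The TF-IDF vectors mentioned in the joint semantic vector representation abstract are also the usual starting point for LSA-style document models, which factor a weighted term-document matrix. A minimal stdlib sketch follows, assuming naive whitespace tokenization and the plain idf = log(N / df) variant (library implementations often add smoothing); `tfidf_vectors` and `cosine` are hypothetical helper names, not the method of any paper listed here.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document to a sparse {term: weight} TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]   # naive whitespace tokenization
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            # Relative term frequency times unsmoothed inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


docs = ["birch clusters large datasets", "birch builds a cf tree", "lsa reduces dimensionality"]
vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[2]))  # → 0.0 (no shared terms)
```

Under this idf variant a term that occurs in every document gets weight zero, which is one reason smoothed variants are common in practice.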
Publication date: 2012